Encoding standards for large text resources: The Text Encoding Initiative

Author

  • Nancy Ide
Abstract

The Text Encoding Initiative (TEI) is an international project established in 1988 to develop guidelines for the preparation and interchange of electronic texts for research, and to satisfy a broad range of uses by the language industries more generally. The need for standardized encoding practices has become increasingly critical as the need to use and, most importantly, reuse vast amounts of electronic text has dramatically increased for both research and industry, in particular for natural language processing. In January 1994, the TEI issued its Guidelines for the Encoding and Interchange of Machine-Readable Texts, which provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications.

Similar resources

Representation schemes for language data: the Text Encoding Initiative and its potential impact for encoding African languages

The Text Encoding Initiative (TEI) Guidelines for the Encoding and Interchange of Machine-Readable Texts provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications. Given the potential challenges of encoding texts in the African languages, it will be important to establish collaboration between the TEI and projects encoding l...

A Corpus of Textual Revisions in Second Language Writing

This paper describes the creation of the first large-scale corpus containing drafts and final versions of essays written by non-native speakers, with the sentences aligned across different versions. Furthermore, the sentences in the drafts are annotated with comments from teachers. The corpus is intended to support research on textual revision by language learners, and how it is influenced by f...

Treating metadata as annotations: separating the content markup from the content

The use of digital learning resources creates an increasing need for semantic metadata, describing the whole resource, as well as parts of resources. Traditionally, schemas such as Text Encoding Initiative (TEI) have been used to add semantic markup for parts of resources. This is not sufficient for use in a "metadata ecology", where metadata is distributed, coherent to different Application Pr...

TEI P5 as an XML Standard for Treebank Encoding

The aim of the paper is to show that a subset of the Text Encoding Initiative Guidelines is a reasonable choice as a standard for stand-off XML encoding of syntactically annotated corpora. The proposed TEI schema, actually employed in the National Corpus of Polish, is compared to other such candidate standards, including TIGER-XML, SynAF and PAULA.

Lessons learned from using SGML in the Text Encoding Initiative

In April of 1994 the ACH-ALLC-ACL Text Encoding Initiative published Guidelines for Electronic Text Encoding and Interchange (Document TEI P3). SGML was used as the basis for the encoding scheme that was developed. Several innovative approaches to the use of SGML were devised during the course of the project. Three aspects of this innovation are documented in the paper. First, all of the tags a...

Publication date: 1994